Distractorless Authorship Verification
نویسندگان
چکیده
Authorship verification is the task of, given a document and a candidate author, determining whether or not the document was written by the candidate author. Traditional approaches to authorship verification have revolved around a “candidate author vs. everything else” approach. Thus, perhaps the most important aspect of performing authorship verification on a document is the development of an appropriate distractor set to represent “everything not the candidate author”. The validity of the results of such experiments hinges on the ability to develop an appropriately representative set of distractor documents. Here, we propose a method for performing authorship verification without the use of a distractor set. Using only training data from the candidate author, we are able to perform authorship verification with high confidence (greater than 90% accuracy rates across a large corpus).
منابع مشابه
Authorship Verification based on Syntax Features
Authorship verification is wildly discussed topic at these days. In the authorship verification problem, we are given examples of the writing of an author and are asked to determine if given texts were or were not written by this author. In this paper we present an algorithm using syntactic analysis system SET for verifying authorship of the documents. We propose three variants of two-class mac...
متن کاملProbabilistic Anomaly Detection Method for Authorship Verification
Authorship verification is the task of determining if a given text is written by a candidate author or not. In this paper, we present a first study on using an anomaly detection method for the authorship verification task. We have considered a weakly supervised probabilistic model based on a multivariate Gaussian distribution. To evaluate the effectiveness of the proposed method, we conducted e...
متن کاملAuthorship Identification in Large Email Collections: Experiments Using Features that Belong to Different Linguistic Levels - Notebook for PAN at CLEF 2011
The aim of this paper is to explore the usefulness of using features from different linguistic levels to email authorship identification. Using various email datasets provided by PAN’11 lab we tested several feature groups in both authorship attribution and authorship verification subtasks. The selected feature groups combined with Regularized Logistic Regression and One-Class SVMmachine learni...
متن کاملCross-Genre Authorship Verification Using Unmasking
This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. The publisher does not give any warranty express or implied or make any representation that the contents will be complete or accurate or up to date...
متن کاملText Categorization for Authorship Verification
Abstract. One common version of the authorship attribution problem is that of authorship verification. We need to determine whether a given author, for whom we have a corpus of writing samples, is also the author of a given anonymous text. The set of alternate candidates is not limited to a given finite closed set. In this paper we show how usual text categorization methods can be adapted to so...
متن کامل